Mapping a vulnerability to a stage of an attack chain taxonomy

ABSTRACT

In an embodiment, a semantic model training method obtains a first textual description of one or more features associated with a first vulnerability that has been used in one or more attacks. Text is parsed from the first textual description in accordance with one or more rules. The method determines a first label for the first vulnerability that is associated with one or more of a plurality of stages of an attack chain taxonomy. A semantic model is generated or refined to map the parsed text to the first label associated with the one or more stages of the attack chain taxonomy.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application for patent is a Continuation of U.S. Provisional application Ser. No. 16/880,198, entitled “MAPPING A VULNERABILITY TO A STAGE OF AN ATTACK CHAIN TAXONOMY,” filed May 21, 2020, assigned to the assignee hereof, and expressly incorporated herein by reference in its entirety.

TECHNICAL FIELD

The various aspects and embodiments described herein generally relate to mapping a vulnerability to stage(s) of an attack chain taxonomy.

BACKGROUND

Identifying vulnerabilities (CVEs) that are actively exploited or may potentially be exploited by attackers, and understanding how a vulnerability can enable the attacker at each stage of the attack lifecycle, are critical for vulnerability assessments, designing risk models for a vulnerability management system, and understanding attacker actions in a given environment.

Given that no CVE is easily classified into an attack chain taxonomy and the volume of vulnerabilities disclosed, defenders lack a concrete approach to prioritize CVEs based on their role in the attack chain and in the context of controls in place. Knowing the intrusion technique for a given CVE, defenders can assess the risk of the CVE based on the stage at which attackers are using the CVE, and deploy controls to monitor for the intrusions. Furthermore, once the intrusion technique is known, a defender can group techniques by tactics to prioritize vulnerabilities for patching.

However, there is presently no source for such use case information for the many thousands of CVEs reported every year, and the manual effort involved in such classification results in networks being exposed to many of these CVEs.

SUMMARY

The following presents a simplified summary relating to one or more aspects and/or embodiments disclosed herein. As such, the following summary should not be considered an extensive overview relating to all contemplated aspects and/or embodiments, nor should the following summary be regarded to identify key or critical elements relating to all contemplated aspects and/or embodiments or to delineate the scope associated with any particular aspect and/or embodiment. Accordingly, the following summary has the sole purpose to present certain concepts relating to one or more aspects and/or embodiments relating to the mechanisms disclosed herein in a simplified form to precede the detailed description presented below.

In an embodiment, a semantic model obtains at least one first textual description of one or more features associated with a first vulnerability that has been used in one or more attacks and parses text from the at least one first textual description in accordance with one or more rules. The semantic model then determines at least one first label for the first vulnerability that is associated with one or more of a plurality of stages of an attack chain taxonomy. From this determination, the semantic model is refined or generated, the semantic model mapping the parsed text to the at least one first label associated with the one or more stages of the attack chain taxonomy.

In an embodiment, the at least one first label is inserted into a joint label space and at least one second label related to one or more intrusion techniques is also inserted into the joint label space. The semantic model generates at least one technique label based on labels in the joint label space. The determination of the at least one first label for the first vulnerability is based on context extracted from the parsed text. The generation of the at least one technique label is based on a distance function between the at least one second label and the at least one first label.
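As an illustrative sketch only, the selection of a technique label by distance in a joint label space might look like the following. The two-dimensional coordinates, the label names, and the choice of cosine distance are all assumptions made for illustration; this passage does not specify the distance function or the dimensionality of the joint label space.

```python
import math

# Hypothetical embeddings of labels in a joint label space.
# The coordinates and label names are invented for illustration.
STAGE_LABELS = {
    "initial_access": (0.9, 0.1),
    "privilege_escalation": (0.1, 0.9),
}
TECHNIQUE_LABELS = {
    "exploit_public_facing_application": (0.8, 0.2),
    "exploitation_for_privilege_escalation": (0.2, 0.8),
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity (an assumed choice of distance function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_technique(stage_label):
    """Pick the technique label closest to the given stage label."""
    stage_vec = STAGE_LABELS[stage_label]
    return min(
        TECHNIQUE_LABELS,
        key=lambda name: cosine_distance(TECHNIQUE_LABELS[name], stage_vec),
    )


if __name__ == "__main__":
    for stage in STAGE_LABELS:
        print(stage, "->", nearest_technique(stage))
```

Any other distance function (e.g., Euclidean) could be substituted for `cosine_distance` without changing the structure of the lookup.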

In an embodiment, a method begins by obtaining at least one textual description of one or more features associated with a vulnerability and/or exploit and parsing text from the at least one textual description in accordance with one or more rules. A model, such as the trained semantic model, is obtained that maps textual data to labels for the one or more features of the vulnerability and/or exploit to respective stages of an attack chain taxonomy. The model maps the parsed text to at least one first label for the first vulnerability associated with one or more stages of the attack chain taxonomy in accordance with the model.
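The parse-then-map flow described above can be pictured with a toy sketch. The parsing rules (lowercasing, tokenizing, dropping stopwords), the stopword list, and the keyword table are hypothetical, and the keyword lookup is only a stand-in for the trained semantic model, which is not a keyword matcher.

```python
import re

# Assumed parsing rules: lowercase, split on non-alphanumerics, drop stopwords.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "via", "allows"}

# Toy stand-in for a trained model: keyword -> attack chain stage label.
# Both the keywords and the stage names are hypothetical.
KEYWORD_TO_STAGE = {
    "remote": "initial_access",
    "privileges": "privilege_escalation",
    "exfiltrate": "exfiltration",
}


def parse_description(text):
    """Parse a textual vulnerability description into tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def map_to_stages(text):
    """Map parsed text to zero or more stage labels, sorted for stable output."""
    tokens = parse_description(text)
    return sorted({KEYWORD_TO_STAGE[t] for t in tokens if t in KEYWORD_TO_STAGE})


if __name__ == "__main__":
    print(map_to_stages("Allows a remote attacker to gain elevated privileges"))
```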

Other objects and advantages associated with the aspects and embodiments disclosed herein will be apparent to those skilled in the art based on the accompanying drawings and detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the various aspects and embodiments described herein and many attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings which are presented solely for illustration and not limitation, and in which:

FIG. 1 illustrates an exemplary network having various assets that can be managed using a vulnerability management system, according to various aspects.

FIG. 2 illustrates another exemplary network having various assets that can be managed using a vulnerability management system, according to various aspects.

FIG. 3 illustrates a server in accordance with an embodiment of the disclosure.

FIG. 4 illustrates an exemplary process for creating and applying a vulnerability characterization model to an enterprise network in accordance with an embodiment of the disclosure.

FIG. 5 illustrates a system for characterizing vulnerabilities in accordance with an embodiment of the disclosure.

FIG. 6 illustrates a schematic of a system for characterizing vulnerabilities in accordance with an embodiment of the disclosure.

FIG. 7A illustrates a context encoder of the system in accordance with an embodiment of the disclosure.

FIG. 7B illustrates a label encoder of the system in accordance with an embodiment of the disclosure.

FIG. 7C illustrates a transform network of the system in accordance with an embodiment of the disclosure.

FIG. 8 illustrates a process for generating a model in accordance with an embodiment of the disclosure.

FIG. 9 illustrates a process for applying a model to new vulnerabilities in accordance with an embodiment of the disclosure.

FIG. 10 illustrates a process for a sequence of stages for building the model in accordance with an embodiment of the disclosure.

FIG. 11 illustrates a process for a sequence of stages for building the model in accordance with an embodiment of the disclosure.

FIG. 12 illustrates a process for a sequence of stages for applying the model in accordance with an embodiment of the disclosure.

FIGS. 13A-13B illustrate a mapping of labels to a two-dimensional space representing attack techniques and attack taxonomy in accordance with an embodiment of the disclosure.

DETAILED DESCRIPTION

Various aspects and embodiments are disclosed in the following description and related drawings to show specific examples relating to exemplary aspects and embodiments. Alternate aspects and embodiments will be apparent to those skilled in the pertinent art upon reading this disclosure, and may be constructed and practiced without departing from the scope or spirit of the disclosure. Additionally, well-known elements will not be described in detail or may be omitted so as to not obscure the relevant details of the aspects and embodiments disclosed herein.

The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. Likewise, the term “embodiments” does not require that all embodiments include the discussed feature, advantage, or mode of operation.

The terminology used herein describes particular embodiments only and should not be construed to limit any embodiments disclosed herein. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Those skilled in the art will further understand that the terms “comprises,” “comprising,” “includes,” and/or “including,” as used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Further, various aspects and/or embodiments may be described in terms of sequences of actions to be performed by, for example, elements of a computing device. Those skilled in the art will recognize that various actions described herein can be performed by specific circuits (e.g., an application specific integrated circuit (ASIC)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of non-transitory computer-readable medium having stored thereon a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects described herein may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” and/or other structural components configured to perform the described action.

As used herein, the term “asset” and variants thereof may generally refer to any suitable uniquely defined electronic object that has been identified via one or more preferably unique but possibly non-unique identifiers or identification attributes (e.g., a universally unique identifier (UUID), a Media Access Control (MAC) address, a Network BIOS (NetBIOS) name, a Fully Qualified Domain Name (FQDN), an Internet Protocol (IP) address, a tag, a CPU ID, an instance ID, a Secure Shell (SSH) key, a user-specified identifier such as a registry setting, file content, information contained in a record imported from a configuration management database (CMDB), etc.). For example, the various aspects and embodiments described herein contemplate that an asset may be a physical electronic object such as, without limitation, a desktop computer, a laptop computer, a server, a storage device, a network device, a phone, a tablet, a wearable device, an Internet of Things (IoT) device, a set-top box or media player, etc. Furthermore, the various aspects and embodiments described herein contemplate that an asset may be a virtual electronic object such as, without limitation, a cloud instance, a virtual machine instance, a container, etc., a web application that can be addressed via a Uniform Resource Identifier (URI) or Uniform Resource Locator (URL), and/or any suitable combination thereof. Those skilled in the art will appreciate that the above-mentioned examples are not intended to be limiting but instead are intended to illustrate the ever-evolving types of resources that can be present in a modern computer network.
As such, the various aspects and embodiments to be described in further detail below may include various techniques to manage network vulnerabilities according to an asset-based (rather than host-based) approach, whereby the various aspects and embodiments described herein contemplate that a particular asset can have multiple unique identifiers (e.g., a UUID and a MAC address) and that a particular asset can have multiples of a given unique identifier (e.g., a device with multiple network interface cards (NICs) may have multiple unique MAC addresses). Furthermore, as will be described in further detail below, the various aspects and embodiments described herein contemplate that a particular asset can have one or more dynamic identifiers that can change over time (e.g., an IP address) and that different assets may share a non-unique identifier (e.g., an IP address can be assigned to a first asset at a first time and assigned to a second asset at a second time). Accordingly, the identifiers or identification attributes used to define a given asset may vary with respect to uniqueness and the probability of multiple occurrences, which may be taken into consideration in reconciling the particular asset to which a given data item refers. Furthermore, in the elastic licensing model described herein, an asset may be counted as a single unit of measurement for licensing purposes.
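One way to picture reconciling a data item to an asset when identifiers differ in uniqueness is a weighted match, sketched below. The weights, the threshold, the `Asset` class, and the function names are hypothetical; the passage above states only that uniqueness and the probability of multiple occurrences may be taken into consideration, not how.

```python
from dataclasses import dataclass, field

# Assumed weights: a higher value means the identifier type is more likely unique.
IDENTIFIER_WEIGHTS = {"uuid": 1.0, "mac": 0.8, "fqdn": 0.5, "ip": 0.2}


@dataclass
class Asset:
    name: str
    # Identifier type -> set of values, since an asset may hold several
    # identifiers of the same type (e.g., one MAC address per NIC).
    identifiers: dict = field(default_factory=dict)


def match_score(asset, observed):
    """Sum the weights of the observed identifiers that the asset also holds."""
    score = 0.0
    for id_type, value in observed.items():
        if value in asset.identifiers.get(id_type, set()):
            score += IDENTIFIER_WEIGHTS.get(id_type, 0.0)
    return score


def reconcile(assets, observed, threshold=0.5):
    """Return the best-matching asset, or None if the evidence is too weak."""
    best = max(assets, key=lambda a: match_score(a, observed), default=None)
    if best is None or match_score(best, observed) < threshold:
        return None
    return best


if __name__ == "__main__":
    laptop = Asset("laptop", {"mac": {"aa:bb", "cc:dd"}, "ip": {"10.0.0.5"}})
    server = Asset("server", {"uuid": {"u-1"}, "ip": {"10.0.0.5"}})
    hit = reconcile([laptop, server], {"ip": "10.0.0.5", "mac": "cc:dd"})
    print(hit.name if hit else None)
```

In this sketch a shared IP address alone falls below the threshold, so it does not decide which asset a data item refers to, while a MAC address match does.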

According to various aspects, FIG. 1 illustrates an exemplary network 100 having various assets 130 that are interconnected via one or more network devices 140 and managed using a vulnerability management system 150. More particularly, as noted above, the assets 130 may include various types, including traditional assets (e.g., physical desktop computers, servers, storage devices, etc.), web applications that run self-supporting code, Internet of Things (IoT) devices (e.g., consumer appliances, conference room utilities, cars parked in office lots, physical security systems, etc.), mobile or bring-your-own-device (BYOD) resources (e.g., laptop computers, mobile phones, tablets, wearables, etc.), and virtual objects (e.g., containers and/or virtual machine instances that are hosted within the network 100, cloud instances hosted in off-site server environments, etc.). Those skilled in the art will appreciate that the assets 130 listed above are intended to be exemplary only and that the assets 130 associated with the network 100 may include any suitable combination of the above-listed asset types and/or other suitable asset types. Furthermore, in various embodiments, the one or more network devices 140 may include wired and/or wireless access points, small cell base stations, network routers, hubs, spanned switch ports, network taps, choke points, and so on, wherein the network devices 140 may also be included among the assets 130 despite being labelled with a different reference numeral in FIG. 1.

According to various aspects, the assets 130 that make up the network 100 (including the network devices 140 and any assets 130 such as cloud instances that are hosted in an off-site server environment or other remote network 160) may collectively form an attack surface that represents the sum total of resources through which the network 100 may be vulnerable to a cyberattack. As will be apparent to those skilled in the art, the diverse nature of the various assets 130 makes the network 100 substantially dynamic and without clear boundaries, whereby the attack surface may expand and contract over time in an often unpredictable manner thanks to trends like BYOD and DevOps, thus creating security coverage gaps and leaving the network 100 vulnerable. For example, due at least in part to exposure to the interconnectedness of new types of assets 130 and abundant software changes and updates, traditional assets like physical desktop computers, servers, storage devices, and so on are more exposed to security vulnerabilities than ever before. Moreover, vulnerabilities have become more and more common in self-supported code like web applications as organizations seek new and innovative ways to improve operations. Although delivering custom applications to employees, customers, and partners can increase revenue, strengthen customer relationships, and improve efficiency, these custom applications may have flaws in the underlying code that could expose the network 100 to an attack. In other examples, IoT devices are growing in popularity and address modern needs for connectivity but can also add scale and complexity to the network 100, which may lead to security vulnerabilities as IoT devices are often designed without security in mind. Furthermore, trends like mobility, BYOD, etc. mean that more and more users and devices may have access to the network 100, whereby the idea of a static network with devices that can be tightly controlled is long gone.
Further still, as organizations adopt DevOps practices to deliver applications and services faster, there is a shift in how software is built and short-lived assets like containers and virtual machine instances are used. While these types of virtual assets can help organizations increase agility, they also create significant new exposure for security teams. Even the traditional idea of a perimeter for the network 100 is outdated, as many organizations are connected to cloud instances that are hosted in off-site server environments, increasing the difficulty to accurately assess vulnerabilities, exposure, and overall risk from cyberattacks that are also becoming more sophisticated, more prevalent, and more likely to cause substantial damage.

Accordingly, to address the various security challenges that may arise due to the network 100 having an attack surface that is substantially elastic, dynamic, and without boundaries, the vulnerability management system 150 may include various components that are configured to help detect and remediate vulnerabilities in the network 100.

More particularly, the network 100 may include one or more active scanners 110 configured to communicate packets or other messages within the network 100 to detect new or changed information describing the various network devices 140 and other assets 130 in the network 100. For example, in one implementation, the active scanners 110 may perform credentialed audits or uncredentialed scans to scan certain assets 130 in the network 100 and obtain information that may then be analyzed to identify potential vulnerabilities in the network 100. More particularly, in one implementation, the credentialed audits may include the active scanners 110 using suitable authentication technologies to log into and obtain local access to the assets 130 in the network 100 and perform any suitable operation that a local user could perform thereon without necessarily requiring a local agent. Alternatively and/or additionally, the active scanners 110 may include one or more agents (e.g., lightweight programs) locally installed on a suitable asset 130 and given sufficient privileges to collect vulnerability, compliance, and system data to be reported back to the vulnerability management system 150. As such, the credentialed audits performed with the active scanners 110 may generally be used to obtain highly accurate host-based data that includes various client-side issues (e.g., missing patches, operating system settings, locally running services, etc.). On the other hand, the uncredentialed scans may generally include network-based scans that involve communicating packets or messages to the appropriate asset(s) 130 and observing responses thereto in order to identify certain vulnerabilities (e.g., that a particular asset 130 accepts spoofed packets that may expose a vulnerability that can be exploited to close established connections). Furthermore, as shown in FIG. 1, one or more cloud scanners 170 may be configured to perform a substantially similar function as the active scanners 110, except that the cloud scanners 170 may also have the ability to scan assets 130 like cloud instances that are hosted in a remote network 160 (e.g., an off-site server environment or other suitable cloud infrastructure).

Additionally, in various implementations, one or more passive scanners 120 may be deployed within the network 100 to observe or otherwise listen to traffic in the network 100, to identify further potential vulnerabilities in the network 100, and to detect activity that may be targeting or otherwise attempting to exploit previously identified vulnerabilities. In one implementation, as noted above, the active scanners 110 may obtain local access to one or more of the assets 130 in the network 100 (e.g., in a credentialed audit) and/or communicate various packets or other messages within the network 100 to elicit responses from one or more of the assets 130 (e.g., in an uncredentialed scan). In contrast, the passive scanners 120 may generally observe (or “sniff”) various packets or other messages in the traffic traversing the network 100 to passively scan the network 100. In particular, the passive scanners 120 may reconstruct one or more sessions in the network 100 from information contained in the sniffed traffic, wherein the reconstructed sessions may then be used in combination with the information obtained with the active scanners 110 to build a model or topology describing the network 100. For example, in one implementation, the model or topology built from the information obtained with the active scanners 110 and the passive scanners 120 may describe any network devices 140 and/or other assets 130 that are detected or actively running in the network 100, any services or client-side software actively running or supported on the network devices 140 and/or other assets 130, and trust relationships associated with the various network devices 140 and/or other assets 130, among other things. In one implementation, the passive scanners 120 may further apply various signatures to the information in the observed traffic to identify vulnerabilities in the network 100 and determine whether any data in the observed traffic potentially targets such vulnerabilities.
In one implementation, the passive scanners 120 may observe the network traffic continuously, at periodic intervals, on a pre-configured schedule, or in response to determining that certain criteria or conditions have been satisfied. The passive scanners 120 may then automatically reconstruct the network sessions, build or update the network model, identify the network vulnerabilities, and detect the traffic potentially targeting the network vulnerabilities in response to new or changed information in the network 100.

In one implementation, as noted above, the passive scanners 120 may generally observe the traffic traveling across the network 100 to reconstruct one or more sessions occurring in the network 100, which may then be analyzed to identify potential vulnerabilities in the network 100 and/or activity targeting the identified vulnerabilities, including one or more of the reconstructed sessions that have interactive or encrypted characteristics (e.g., due to the sessions including packets that had certain sizes, frequencies, randomness, or other qualities that may indicate potential backdoors, covert channels, or other vulnerabilities in the network 100). Accordingly, the passive scanners 120 may monitor the network 100 in substantially real-time to detect any potential vulnerabilities in the network 100 in response to identifying interactive or encrypted sessions in the packet stream (e.g., interactive sessions may typically include activity occurring through keyboard inputs, while encrypted sessions may cause communications to appear random, which can obscure activity that installs backdoors or rootkit applications). Furthermore, in one implementation, the passive scanners 120 may identify changes in the network 100 from the encrypted and interactive sessions (e.g., an asset 130 corresponding to a new e-commerce server may be identified in response to the passive scanners 120 observing an encrypted and/or interactive session between a certain host located in the remote network 160 and a certain port that processes electronic transactions). In one implementation, the passive scanners 120 may observe as many sessions in the network 100 as possible to provide optimal visibility into the network 100 and the activity that occurs therein. For example, in one implementation, the passive scanners 120 may be deployed at any suitable location that enables the passive scanners 120 to observe traffic going into and/or out of one or more of the network devices 140.
In one implementation, the passive scanners 120 may be deployed on any suitable asset 130 in the network 100 that runs a suitable operating system (e.g., a server, host, or other device that runs Red Hat Linux or FreeBSD open source operating system, a UNIX, Windows, or Mac OS X operating system, etc.).
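The observation that encrypted sessions may cause communications to appear random can be illustrated with a byte-entropy heuristic. This is a sketch of one common way to estimate randomness, not the detection method of the passive scanners 120; the function names and the threshold of 7.0 bits per byte are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Estimate the entropy of a payload in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(payload: bytes, threshold: float = 7.0) -> bool:
    """Flag payloads whose byte distribution is close to uniform.

    The threshold is an assumed tuning parameter; compressed data also
    scores high, so this is a hint rather than proof of encryption.
    """
    return shannon_entropy(payload) >= threshold


if __name__ == "__main__":
    print(looks_encrypted(b"GET /index.html HTTP/1.1"))
    print(looks_encrypted(bytes(range(256))))
```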

Furthermore, in one implementation, the various assets and vulnerabilities in the network 100 may be managed using the vulnerability management system 150, which may provide a unified security monitoring solution to manage the vulnerabilities and the various assets 130 that make up the network 100. In particular, the vulnerability management system 150 may aggregate the information obtained from the active scanners 110 and the passive scanners 120 to build or update the model or topology associated with the network 100, which may generally include real-time information describing various vulnerabilities, applied or missing patches, intrusion events, anomalies, event logs, file integrity audits, configuration audits, or any other information that may be relevant to managing the vulnerabilities and assets in the network 100. As such, the vulnerability management system 150 may provide a unified interface to mitigate and manage governance, risk, and compliance in the network 100.

According to various aspects, FIG. 2 illustrates another exemplary network 200 with various assets 230 that can be managed using a vulnerability management system 250. In particular, the network 200 shown in FIG. 2 may have various components and perform substantially similar functionality as described above with respect to the network 100 shown in FIG. 1. For example, in one implementation, the network 200 may include one or more active scanners 210 and/or cloud scanners 270, which may interrogate assets 230 in the network 200 to build a model or topology of the network 200 and identify various vulnerabilities in the network 200, as well as one or more passive scanners 220 that can passively observe traffic in the network 200 to further build the model or topology of the network 200, identify further vulnerabilities in the network 200, and detect activity that may potentially target or otherwise exploit the vulnerabilities. Additionally, in one implementation, a log correlation engine 290 may be arranged to receive logs containing events from various sources distributed across the network 200. For example, in one implementation, the logs received at the log correlation engine 290 may be generated by internal firewalls 280, external firewalls 284, network devices 240, assets 230, operating systems, applications, or any other suitable resource in the network 200. Accordingly, in one implementation, the information obtained from the active scanners 210, the cloud scanners 270, the passive scanners 220, and the log correlation engine 290 may be provided to the vulnerability management system 250 to generate or update a comprehensive model associated with the network 200 (e.g., topologies, vulnerabilities, assets, etc.).

In one implementation, the active scanners 210 may be strategically distributed in locations across the network 200 to reduce stress on the network 200. For example, the active scanners 210 may be distributed at different locations in the network 200 in order to scan certain portions of the network 200 in parallel, whereby an amount of time to perform the active scans may be reduced. Furthermore, in one implementation, one or more of the active scanners 210 may be distributed at a location that provides visibility into portions of a remote network 260 and/or offloads scanning functionality from the managed network 200. For example, as shown in FIG. 2, one or more cloud scanners 270 may be distributed at a location in communication with the remote network 260, wherein the term “remote network” as used herein may refer to the Internet, a partner network, a wide area network, a cloud infrastructure, and/or any other suitable external network. As such, the terms “remote network,” “external network,” “partner network,” and “Internet” may all be used interchangeably to suitably refer to one or more networks other than the networks 100, 200 that are managed using the vulnerability management systems 150, 250, while references to “the network” and/or “the internal network” may generally refer to the areas that the systems and methods described herein may be used to protect or otherwise manage. Accordingly, in one implementation, limiting the portions in the managed network 200 and/or the remote network 260 that the active scanners 210 are configured to interrogate, probe, or otherwise scan and having the active scanners 210 perform the scans in parallel may reduce the amount of time that the active scans consume because the active scanners 210 can be distributed closer to scanning targets.
In particular, because the active scanners 210 may scan limited portions of the network 200 and/or offload scanning responsibility to the cloud scanners 270, and because the parallel active scans may obtain information from the different portions of the network 200, the overall amount of time that the active scans consume may substantially correspond to the amount of time associated with one active scan.
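The effect of scanning portions of a network in parallel can be sketched with a thread pool, where each worker stands in for one scanner assigned to one portion. The segment names, the `scan_segment` placeholder, and its simulated delay are hypothetical; a real active scan would probe assets rather than sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scan_segment(segment):
    """Stand-in for one active scan of one network portion (simulated delay)."""
    time.sleep(0.05)
    return segment, f"scanned {segment}"


def scan_in_parallel(segments):
    """Run one scan per segment concurrently and collect results by segment."""
    # One worker per segment, so the elapsed time is roughly that of a single
    # scan rather than the sum of all scans.
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        return dict(pool.map(scan_segment, segments))


if __name__ == "__main__":
    start = time.perf_counter()
    results = scan_in_parallel(["dmz", "office-lan", "datacenter"])
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```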

As such, in one implementation, the active scanners 210 and/or cloud scanners 270 may generally scan the respective portions of the network 200 to obtain information describing vulnerabilities and assets in the respective portions of the network 200. In particular, the active scanners 210 and/or cloud scanners 270 may perform the credentialed and/or uncredentialed scans in the network in a scheduled or distributed manner to perform patch audits, web application tests, operating system configuration audits, database configuration audits, sensitive file or content searches, or other active probes to obtain information describing the network. For example, the active scanners 210 and/or cloud scanners 270 may conduct the active probes to obtain a snapshot that describes assets actively running in the network 200 at a particular point in time (e.g., actively running network devices 240, internal firewalls 280, external firewalls 284, and/or other assets 230). In various embodiments, the snapshot may further include any exposures that the actively running assets have to vulnerabilities identified in the network 200 (e.g., sensitive data that the assets contain, intrusion events, anomalies, or access control violations associated with the assets, etc.), configurations for the actively running assets (e.g., operating systems that the assets run, whether passwords for users associated with the assets comply with certain policies, whether assets that contain sensitive data such as credit card information comply with the policies and/or industry best practices, etc.), or any other information suitably describing vulnerabilities and assets actively detected in the network 200.
In one implementation, in response to obtaining the snapshot of the network 200, the active scanners 210 and/or cloud scanners 270 may then report the information describing the snapshot to the vulnerability management system 250, which may use the information provided by the active scanners 210 to remediate and otherwise manage the vulnerabilities and assets in the network.

Furthermore, in one implementation, the passive scanners 220 may be distributed at various locations in the network 200 to monitor traffic traveling across the network 200, traffic originating within the network 200 and directed to the remote network 260, and traffic originating from the remote network 260 and directed to the network 200, thereby supplementing the information obtained with the active scanners 210. For example, in one implementation, the passive scanners 220 may monitor the traffic traveling across the network 200 and the traffic originating from and/or directed to the remote network 260 to identify vulnerabilities, assets, or information that the active scanners 210 may be unable to obtain because the traffic may be associated with previously inactive assets that later participate in sessions on the network. Additionally, in one implementation, the passive scanners 220 may be deployed directly within or adjacent to an intrusion detection system sensor 215, which may provide the passive scanners 220 with visibility relating to intrusion events or other security exceptions that the intrusion detection system (IDS) sensor 215 identifies. In one implementation, the IDS may be an open source network intrusion prevention and detection system (e.g., Snort), a packet analyzer, or any other system having a suitable IDS sensor 215 that can detect and prevent intrusion or other security events in the network 200.

Accordingly, in various embodiments, the passive scanners 220 may sniff one or more packets or other messages in the traffic traveling across, originating from, or directed to the network 200 to identify new network devices 240, internal firewalls 280, external firewalls 284, or other assets 230 in addition to open ports, client/server applications, any vulnerabilities, or other activity associated therewith. In addition, the passive scanners 220 may further monitor the packets in the traffic to obtain information describing activity associated with web sessions, Domain Name System (DNS) sessions, Server Message Block (SMB) sessions, File Transfer Protocol (FTP) sessions, Network File System (NFS) sessions, file access events, file sharing events, or other suitable activity that occurs in the network 200. In one implementation, the information that the passive scanners 220 obtain from sniffing the traffic traveling across, originating from, or directed to the network 200 may therefore provide a real-time record describing the activity that occurs in the network 200. Accordingly, in one implementation, the passive scanners 220 may behave like a security motion detector on the network 200, mapping and monitoring any vulnerabilities, assets, services, applications, sensitive data, and other information that newly appear or change in the network 200. The passive scanners 220 may then report the information obtained from the traffic monitored in the network to the vulnerability management system 250, which may use the information provided by the passive scanners 220 in combination with the information provided from the active scanners 210 to remediate and otherwise manage the network 200.

In one implementation, as noted above, the network 200 shown in FIG. 2 may further include a log correlation engine 290, which may receive logs containing one or more events from various sources distributed across the network 200 (e.g., logs describing activities that occur in the network 200, such as operating system events, file modification events, USB device insertion events, etc.). In particular, the logs received at the log correlation engine 290 may include events generated by one or more of the internal firewalls 280, external firewalls 284, network devices 240, and/or other assets 230 in the network 200 in addition to events generated by one or more operating systems, applications, and/or other suitable sources in the network 200. In one implementation, the log correlation engine 290 may normalize the events contained in the various logs received from the sources distributed across the network 200, and in one implementation, may further aggregate the normalized events with information describing the snapshot of the network 200 obtained by the active scanners 210 and/or the network traffic observed by the passive scanners 220. Accordingly, in one implementation, the log correlation engine 290 may analyze and correlate the events contained in the logs, the information describing the observed network traffic, and/or the information describing the snapshot of the network 200 to automatically detect statistical anomalies, correlate intrusion events or other events with the vulnerabilities and assets in the network 200, search the correlated event data for information meeting certain criteria, or otherwise manage vulnerabilities and assets in the network 200.

Furthermore, in one implementation, the log correlation engine 290 may filter the events contained in the logs, the information describing the observed network traffic, and/or the information describing the snapshot of the network 200 to limit the information that the log correlation engine 290 normalizes, analyzes, and correlates to information relevant to a certain security posture (e.g., rather than processing thousands or millions of events generated across the network 200, which could take a substantial amount of time, the log correlation engine 290 may identify subsets of the events that relate to particular intrusion events, attacker network addresses, assets having vulnerabilities that the intrusion events and/or the attacker network addresses target, etc.). Alternatively (or additionally), the log correlation engine 290 may persistently save the events contained in all of the logs to comply with regulatory requirements providing that all logs must be stored for a certain period of time (e.g., saving the events in all of the logs to comply with the regulatory requirements while only normalizing, analyzing, and correlating the events in a subset of the logs that relate to a certain security posture). As such, the log correlation engine 290 may aggregate, normalize, analyze, and correlate information received in various event logs, snapshots obtained by the active scanners 210 and/or cloud scanners 270, and/or the activity observed by the passive scanners 220 to comprehensively monitor, remediate, and otherwise manage the vulnerabilities and assets in the network 200. Additionally, in one implementation, the log correlation engine 290 may be configured to report information relating to the information received and analyzed therein to the vulnerability management system 250, which may use the information provided by the log correlation engine 290 in combination with the information provided by the passive scanners 220, the active scanners 210, and the cloud scanners 270 to remediate or manage the network 200.

Accordingly, in various embodiments, the active scanners 210 and/or cloud scanners 270 may interrogate any suitable asset 230 in the network 200 to obtain information describing a snapshot of the network 200 at any particular point in time, the passive scanners 220 may continuously or periodically observe traffic traveling in the network 200 to identify vulnerabilities, assets, or other information that further describes the network 200, and the log correlation engine 290 may collect additional information to further identify the vulnerabilities, assets, or other information describing the network 200. The vulnerability management system 250 may therefore provide a unified solution that aggregates vulnerability and asset information obtained by the active scanners 210, the cloud scanners 270, the passive scanners 220, and the log correlation engine 290 to comprehensively manage the network 200.

Security auditing applications typically display security issues (such as vulnerabilities, security misconfigurations, weaknesses, etc.) paired with a particular solution for that given issue. Certain security issues may share a given solution, or have solutions which are superseded or otherwise rendered unnecessary by other reported solutions. Embodiments of the disclosure relate to improving the efficiency with which security issues are reported, managed and/or rectified based on solution supersedence.

In accordance with a first embodiment, when working with security reporting datasets with sparse metadata available, the reported solutions for each security issue are combined, and various “rulesets” are applied against the combined solutions to de-duplicate them and remove solutions that have been superseded by other solutions. As used herein, a ruleset is a set of rules that govern when a solution is to be removed or merged with another and how that merge is to be accomplished. In an example, when solution texts not matching a given ruleset are discovered, they are flagged for manual review. Examples of rules that may be included in one or more rulesets are as follows:

-   If there is more than one matching solution in the solution list, remove all but one of those solutions.
-   For solutions matching “Upgrade to <product> x.y.z” where x, y, and z are integers, select a single result with the highest x.y.z value (comparing against x first, then y, then z).
-   For solutions matching “Apply fix <fix> to <product>”, create a new combined solution where <fix> for each solution is concatenated into a comma separated list for a given <product>.
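The three example rules can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function name `apply_rulesets`, the two regular expressions, and the choice to compare upgrade versions per product are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the two solution templates named in the rules.
UPGRADE_RE = re.compile(r"^Upgrade to (?P<product>.+) (?P<x>\d+)\.(?P<y>\d+)\.(?P<z>\d+)$")
FIX_RE = re.compile(r"^Apply fix (?P<fix>\S+) to (?P<product>.+)$")


def apply_rulesets(solutions):
    """Return (reduced_solutions, flagged_for_review) for a list of solution texts."""
    # Rule 1: keep a single copy of identical solutions (first-seen order).
    unique = list(dict.fromkeys(solutions))

    best_upgrade = {}          # product -> highest (x, y, z) seen so far
    fixes = defaultdict(list)  # product -> fix identifiers in first-seen order
    flagged = []               # texts matching no rule go to manual review

    for text in unique:
        upgrade = UPGRADE_RE.match(text)
        fix = FIX_RE.match(text)
        if upgrade:
            version = tuple(int(upgrade[k]) for k in ("x", "y", "z"))
            product = upgrade["product"]
            # Rule 2: tuple comparison checks x first, then y, then z.
            if product not in best_upgrade or version > best_upgrade[product]:
                best_upgrade[product] = version
        elif fix:
            # Rule 3: collect every fix identifier reported for the product.
            fixes[fix["product"]].append(fix["fix"])
        else:
            flagged.append(text)

    reduced = [f"Upgrade to {p} {x}.{y}.{z}" for p, (x, y, z) in best_upgrade.items()]
    reduced += [f"Apply fix {', '.join(ids)} to {p}" for p, ids in fixes.items()]
    return reduced, flagged
```

Comparing versions as integer tuples avoids the string-comparison pitfall in which “1.10.0” would sort below “1.9.0”.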

In accordance with a second embodiment, when working with datasets with metadata available that have an identifier that allows grouping of solutions based on product (e.g., common product enumeration (CPE)) and timestamp information on when a fix has become available, the solutions for each group can be filtered so that only the latest “top level” solution for each group is displayed. In an example, the first and second embodiments can be implemented in conjunction with each other to produce a further refined solution set.

As used herein, a “plug-in” contains logic and metadata for an individual security check in a security auditing application. A plugin may check for one or more mitigations/fixes and flag one or more individual security issues. CPE is a standardized protocol of describing and identifying classes of applications, operating systems, and hardware devices present among an enterprise's computing assets. CPE identifiers contain asset type information (OS/Hardware/Application), vendor, product, and can even contain version information. An example CPE string is “cpe:/o:microsoft:windows_vista:6.0:sp1”, where “/o” stands for operating system, Microsoft is the vendor, windows_vista is the product, major version is 6.0, and minor version is SP1. Further, a common vulnerabilities and exposures (CVE) identifier is an identifier from a national database maintained by NIST/Mitre which keeps a list of known vulnerabilities and exposures. An example identifier would be “CVE-2014-6271” which corresponds to the “Shell Shock” vulnerability in the database.
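As a small illustration of how the example CPE string decomposes, the following Python sketch splits a CPE URI into named fields. The function name `parse_cpe_uri` is hypothetical, and the field names “version” and “update” follow CPE 2.2 URI naming rather than the “major version” and “minor version” wording used above.

```python
def parse_cpe_uri(cpe):
    """Split a CPE URI such as cpe:/o:microsoft:windows_vista:6.0:sp1 into fields."""
    prefix = "cpe:/"
    if not cpe.startswith(prefix):
        raise ValueError(f"not a CPE URI: {cpe!r}")
    parts = cpe[len(prefix):].split(":")
    # Leading components of a CPE 2.2 URI, in order; trailing ones may be absent.
    names = ["part", "vendor", "product", "version", "update"]
    fields = dict(zip(names, parts))
    # Expand the one-letter asset type code into a readable name.
    part_names = {"o": "operating system", "a": "application", "h": "hardware"}
    fields["part"] = part_names.get(fields["part"], fields["part"])
    return fields


fields = parse_cpe_uri("cpe:/o:microsoft:windows_vista:6.0:sp1")
```

Here `fields` holds the asset type, vendor, product, version, and update components, which is the information a grouping step keyed on CPE would need.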

In accordance with one implementation of the second embodiment, solutions (or solution ‘texts’) may first be grouped together based on the CPEs in the plugins they were reported in. The solutions are then sorted by the patch publication date from the plugins from which they were sourced. Solutions containing text that matches a pattern that indicates that the solution is likely a patch recommendation can all be removed from the group except the solution associated with the most recent patch. In this manner, patches with identifiers that cannot be easily sorted (e.g., patches with non-numerical identifiers) and/or for which no ruleset pertains in accordance with the first embodiment can be filtered out from the solution set. In some implementations, additional ruleset-based filtering from the first embodiment can also be applied, to filter out (or de-duplicate) additional duplicate solution information.
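The grouping and date-based filtering described above might look roughly like the following Python sketch. The tuple layout of `reports`, the function name `latest_patch_per_cpe`, and the `PATCH_RE` pattern for recognizing a patch recommendation are all assumptions introduced for illustration.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical pattern for "this text is likely a patch recommendation".
PATCH_RE = re.compile(r"\b(patch|hotfix|update)\b", re.IGNORECASE)


def latest_patch_per_cpe(reports):
    """reports: iterable of (cpe, patch_publication_date, solution_text) tuples."""
    groups = defaultdict(list)
    for cpe, published, text in reports:
        groups[cpe].append((published, text))

    result = {}
    for cpe, items in groups.items():
        items.sort(key=lambda item: item[0])  # oldest publication date first
        patches = [text for _, text in items if PATCH_RE.search(text)]
        others = [text for _, text in items if not PATCH_RE.search(text)]
        # Keep non-patch solutions plus only the most recent patch recommendation.
        result[cpe] = others + patches[-1:]
    return result


filtered = latest_patch_per_cpe([
    ("cpe:/a:acme:widget", date(2020, 1, 1), "Install patch ABC"),
    ("cpe:/a:acme:widget", date(2020, 6, 1), "Install patch XYZ-final"),
    ("cpe:/a:acme:widget", date(2020, 3, 1), "Disable remote login"),
    ("cpe:/o:acme:os", date(2019, 5, 5), "Apply hotfix Q7"),
])
```

Because the ordering comes from the publication date rather than the patch identifier, identifiers such as “ABC” and “XYZ-final” never need to be compared with each other.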

In accordance with a third embodiment, a security auditing application may evaluate further metadata in the solution report results that is added based upon asset-specific information (e.g., individual patches installed, which mitigations and patches are missing, what individual software installations are installed, patch supersedence information, the relationship between the mitigations/patches and security issues, etc.).

The various embodiments may be implemented on any of a variety of commercially available server devices, such as server 300 illustrated in FIG. 3. In an example, the server 300 may correspond to one example configuration of a server on which a security auditing application may execute, which in certain implementations may be included as part of the vulnerability management system 150 of FIG. 1 or the vulnerability management system 250 of FIG. 2. In FIG. 3, the server 300 includes a processor 301 coupled to volatile memory 302 and a large capacity nonvolatile memory, such as a disk drive 303. The server 300 may also include a floppy disc drive, compact disc (CD) or DVD disc drive 306 coupled to the processor 301. The server 300 may also include network access ports 304 coupled to the processor 301 for establishing data connections with a network 307, such as a local area network coupled to other broadcast system computers and servers or to the Internet.

FIG. 4 illustrates a process 400 that provides for continuous improvement of the functionality and performance of the vulnerability management system 250 in accordance with an embodiment of the disclosure. In some designs, the process 400 is advantageously automated so that a manual extraction of descriptions of vulnerabilities is not needed. Furthermore, the system may be able to automatically prioritize various vulnerabilities for correction without human intervention. The details of this process are illustrated further in FIGS. 8-12.

At 410, the vulnerability management system 250 receives a CVE description or intrusion report, generates intrusion reports for past attacks, or records mitigation techniques taken by the vulnerability management system 250 in response to a breach. This information may be received from external databases (e.g., GOOGLE project zero). Any data that is parsed from CVE-related sources in this manner is broadly described herein as a CVE “feature”.

At 420, the vulnerability management system 250 characterizes and labels each CVE or recorded intrusion in an automated CVE characterization device, such as computing device 501, in accordance with a model. As used herein, a CVE “label” is determined based on its associated CVE features, and is used to characterize the attack chain taxonomy (e.g., ATT&CK) stage(s) associated with the CVE. Examples of how the model is generated and refined (or trained) are described in more detail below.

At 430, the vulnerability management system 250 predicts CVE uses and attack techniques using the model, prioritizes CVE fixes based on the system setup and other CVEs, or predicts or suggests mitigation techniques for the vulnerability management system to use to address the CVE exploit. These predicted features of each CVE add to the knowledge database of the vulnerability management system 150/250 (i.e., the model is continually trained or refined based on new data). Thus, without human intervention, the system and/or model is capable of discovering or predicting possible uses and mitigation strategies based only on a CVE description.

In at least one embodiment, the vulnerability management system 250 further includes a computing device 501 as in FIG. 5 for analyzing CVEs and fitting the CVEs into possible attack sequences. In some designs, the computing device 501 may be a commercial server device as illustrated in FIG. 3 or may be a dedicated device or ASIC that is embedded in the network 200. The computing device 501 may connect to external database(s) 508 to receive CVE information, exploit reports, and/or network logs. The computing device 501 may also transmit analysis data to the external database(s) 508 to assist the vulnerability management system in identifying and prioritizing CVEs.

The computing device 501 may include a data manipulator 502 that provides digital storage space for structured and unstructured data as well as data processing capabilities for data analysis. The data manipulator 502 may include many nodes and connections in a hierarchical or layered structure to facilitate mapping of data points to each other. For example, the connections may be ordered via a convolutional neural network, a recurrent neural network, or other neural network operated by the data manipulator. Specifically, in at least one embodiment, the data manipulator may perform sorts, filters, comparisons, correlations, similarity determinations, and/or other data analysis.

The data manipulator 502 as illustrated in FIG. 5 may include a joint latent space 506 that stores data and a context encoder 503, a label encoder 504, and a transform network 505. The context encoder 503 may feed the joint latent space 506 with data objects encoded, extracted or characterized by the context encoder 503. The label encoder 504 may feed the joint latent space 506 with data objects encoded, extracted or characterized by the label encoder 504. The transform network 505 may perform additional data analysis on the data objects encoded by the context encoder 503 and the label encoder 504. The transformer network 505 may receive data objects and reprocess them back to the joint latent space 506 with additional or new embeddings. The computing device 501 may include a Multi-Layer Perceptron (MLP) classifier 507 that operates on the joint latent space 506 and arranges the data objects of the joint latent space 506. In addition, the MLP classifier 507 may output data objects as results to the external database(s) 508. These results may be used by the vulnerability management system 250.

Intrusion techniques comprise the actions that adversaries (or attackers) attempt to perform to accomplish goals and are the foundation of the vulnerability model. Adversarial Tactics, Techniques & Common Knowledge (ATT&CK) is one example of an attack chain taxonomy developed by MITRE. The aim of ATT&CK as defined by MITRE is to categorize adversary behavior to help improve post-compromise detection of advanced intrusions. Software vulnerabilities (CVEs) play an important role in cyber-intrusions, and are mostly classified into four ATT&CK techniques, which cover the exploitation phases (or stages) of the attack chain.

The context encoder 503, the label encoder 504, and the transform network 505 are embedding modules illustrated in more detail in FIG. 6. Specifically, a labeling and filtering pipeline 600 connects the context encoder 503, the label encoder 504, and the transform network 505 for output to the joint latent space 506. The context encoder 503 is connected to the label encoder 504 at a combination node 1007. The label encoder 504 and the transform network are connected at another combination node 1009. Each of the nodes connecting the data processors 503, 504, 505 also connects to the central node 1008 which transmits data to the joint latent space 506. Each of the nodes may combine, connect, or filter the outputs of the context encoder 503, the label encoder 504, and the transform network 505 according to one or more rules or algorithms.

The context encoder 503 receives features and descriptions 601 from parsers of unstructured data. The unstructured data may include CVE descriptions, exploit reports, zero days, leaked or auctioned data, and intrusions detected by the passive scanners 120 or the active scanners 110. The label encoder 504 receives word and character tokens 602 which are also generated from unstructured data. The unstructured data translated into word tokens 602 may originate in Adversarial Tactics, Techniques & Common Knowledge (ATT&CK) descriptions that describe attack or intrusion sequences from intrusion logs.

The transform network 505 may also receive data objects with mitigation steps 603 that are parsed from exploit reports, intrusion reports, or imported from a database of mitigation techniques and patches. In addition, the transform network 505 may receive data objects from the joint latent space 506 and use feedback to add embeddings and improve the data objects. Specifically, the transform network may receive the pre-exploit descriptions (e.g., exploited system configuration) and post-exploit descriptions (e.g., recovery method, logs, or isolation method) as textual descriptions, parsed textual descriptions, or encoded text. The mitigation steps 603 may also be generated from unstructured data or information from the vulnerability management system. The transform network 505 may be a non-linear or recursive processor of the data objects or textual information.

The context encoder 503 includes word tokens 701 inputted to the system from the features and descriptions 601. The word tokens 701 are generated by a word parser (e.g., word2vec) that converts natural language to word strings or tokens 701 that are arranged in an array as shown in FIG. 7A. The word tokens 701 are input into a bi-Long Short Term Memory model 702 that outputs context labels 703 to the first combination node. The bi-Long Short Term Memory model 702 is based on an artificial recurrent neural network architecture with feedback to process sequential streams of tokens into labels and/or embeddings.

The label encoder 504 receives word token embeddings 705 and character-based token embeddings and inputs the embeddings into another bi-Long Short Term Memory (LSTM) model 706. The embeddings include word and character tokens 602 that are derived from descriptions of intrusion techniques (e.g., ATT&CK stages). The label encoder 504 may apply a parser (e.g., word2vec) to the inputs to convert the data to embeddings or vectors. The LSTM model outputs a label 704 to the first combination node and the second combination node. The labels 704 are output to each combination node for improved embeddings and similarity analysis at the nodes.

A single layer of the transformer network 505 includes at least two feedback loops and connections to other layers of the transformer network 505. Each loop includes a stage to add bias and normalize 707 the labels. Layer normalization, LayerNorm(x+Sublayer(x)), is also used after each sublayer, where Sublayer(x) denotes the sub-layer function. In addition, a first loop includes a multi-head self-attention stage 708 that identifies similarity between newly coded labels 704 and mitigation steps. In the transform network 505, each key, query, and value may be a vector corresponding to a sentence. The transformer network 505 may also receive tokens with mitigation techniques, patches, or protection protocols. In some designs, the output of the transformer module is an embedding vector. The transformer block captures the context of the CVE with respect to mitigations and exploit steps. This helps to improve the labeling of other heads and also handles cases of missing data in other heads (i.e., when data is not sparse in textual descriptions of exploits).
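The expression LayerNorm(x+Sublayer(x)) can be made concrete with a minimal pure-Python sketch. It assumes a single position vector and omits the learned gain and bias terms that a trained layer would carry; the function names are hypothetical.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def residual_block(x, sublayer):
    """Compute LayerNorm(x + Sublayer(x)) for one position vector x."""
    summed = [a + b for a, b in zip(x, sublayer(x))]  # residual connection
    return layer_norm(summed)


# Example with a stand-in sublayer that simply halves its input.
normalized = residual_block([1.0, 2.0, 3.0, 4.0], lambda v: [0.5 * t for t in v])
```

The residual sum lets the input pass through unchanged alongside the sublayer output, and the normalization keeps the scale of the result stable from layer to layer.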

A self-attention stage such as stage 708 computes a new value for each vector by comparing it with all vectors (including itself). Additionally, a multi-head transform as in stage 708 transforms an array of vectors and then applies attention to each head before performing a final transformation. In addition to attention sub-layers, in some designs, each of the layers in the encoder and decoder of the transform network 505 contains a fully connected feed-forward network 709, which is applied to each position separately and identically. Each layer of the transformer network may include a position-wise feed-forward sub-layer 709 that compares across positions of a vector array and passes input through one or more layers of neural networks before output. Residual connections may be maintained across layers or sublayers for easy passage of information through a deep stack of layers.
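A minimal pure-Python sketch of the self-attention computation follows. It assumes the commonly used scaled dot-product form, which the description does not name explicitly, and it replaces the learned query, key, value, and output transformations with the identity, so it shows only the comparison and weighting steps.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Compare this vector with every vector, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # New value: attention-weighted average of all vectors.
        outputs.append([sum(w * vec[i] for w, vec in zip(weights, vectors))
                        for i in range(dim)])
    return outputs


def multi_head_self_attention(vectors, num_heads):
    """Split each vector into equal slices, attend per slice, then concatenate."""
    dim = len(vectors[0])
    if dim % num_heads != 0:
        raise ValueError("vector size must be divisible by the number of heads")
    size = dim // num_heads
    heads = [self_attention([v[h * size:(h + 1) * size] for v in vectors])
             for h in range(num_heads)]
    return [[x for head in heads for x in head[i]] for i in range(len(vectors))]


attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output row is a weighted mix of all input rows, with the largest weight on the rows most similar to the query row.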

In some designs, the model architecture of the labelling and filtering pipeline 600 may be adapted to encode labels from unstructured data. These labels are then fed into the joint latent space 506. Newly discovered CVEs may also be run through the context encoder side of the labelling and filtering pipeline 600 starting with their features and descriptions 601 being input to the context encoder 503. The labels output by the context encoder 503 for the new CVEs are passed on to the joint latent space for further analysis. In this sense, the whole model architecture functions both to create a model trained with machine learning and to ingest new information and contextualize it.

FIG. 8 illustrates a process of creating the model from various unstructured inputs according to an embodiment of the disclosure. In an example, the process of FIG. 8 may be implemented via a vulnerability management system, such as the vulnerability management system 250 of FIG. 2.

At 802, the system obtains at least one first textual description of one or more features associated with a first vulnerability that has been used in one or more attacks. The textual description inputted during the model building process may be more detailed than a simple CVE description. The textual description may be one or more documents. Preferably, in some designs, the textual description is an exploit report or intrusion log that details the use of the first vulnerability in the wild. Example textual descriptions of features of a particular CVE are depicted in Table 1, as follows:

TABLE 1 CVE Textual Description Example

Feature Name | Example Description
MITRE Techniques | User Execution, Exploitation of Remote Services, Spear Phishing
CVE number | CVE-2017-8759
CVE description | Microsoft .NET framework 2.0, 3.5, and 3.5.1 allows an attacker to execute code remotely via a malicious document or application
Attack Sequence | Document based: “An attacker crafts a malicious document to leverage the remote execution” Application based: “An attacker constructs a malicious .NET application and uploads it to a network device”
Mitigations and controls for defense | Web users should be cautious following links to sites provided by unfamiliar sources, filter HTML from emails, deploy intrusion detection system to monitor network traffic.
High level features | Authentication: not required; Availability: User initiated; Vendor: Microsoft; Classification: input validation error.

Examples of various mitigation techniques that may be part of the textual description of a particular CVE are as follows:

TABLE 2 Mitigation Technique Descriptions

Mitigation Category | Mitigation Strategy
Restrict/Deny | Do not follow links provided by unknown or untrusted sources. Block external access at the network boundary, unless external parties require service. Do not accept or execute files from untrusted or unknown sources.
Evaluate and Fix Default Config | Set web browser security to disable the execution of script code. Implement multiple redundant layers of security. Set web browser to disable the execution of Javascript.
Implement Physical Security | Do not allow untrusted users physical access to systems. Limit access to sensitive data or removable media. Allow only trusted individuals in range of WAN.
Implement Secure Communication Channel | Communicate through secure means or encryption.
Inspect and filter network traffic data | Deploy network intrusion monitoring. Filter malicious network data. Review logs for more information.
Use Strong Authentication | Implement multiple authentication mechanism. Use strong passwords.
Use of Least Privilege | Run all software as non-privileged users with minimal access rights. Limit privileges to minimal needed.

At 804, the system parses text from the at least one first textual description in accordance with one or more rules. The rules may include selecting certain nouns, pronouns, verbs, and/or abbreviations from the textual description. The rules may include selecting words based on proximity to a named CVE or other keyword. The rules may include selecting or separating words based on whether the words precede a keyword or follow a keyword. The rules may be adapted for various languages. The parsing may include filtering and vectorizing the words. The resultant parsed text (e.g., after filtering, vectorizing, etc.) is referred to herein as a CVE “context”. Accordingly, reference to the parsed text may refer to the literal parsed text, or alternatively a processed version of the parsed text.
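One way such proximity and precede/follow rules could be realized is sketched below in Python. The window size, the stop-word list, and the function names are hypothetical choices for illustration; the sketch covers only word selection and filtering, not vectorizing.

```python
import re

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")
# Hypothetical stop-word list; a real rule set would be larger and language specific.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "and", "via", "is", "by"}


def _keep(words):
    """Lowercase the words and drop stop words."""
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]


def parse_context(description, window=4):
    """Select words near a named CVE, separated into preceding and following words."""
    tokens = [t.strip(".,;:()\"'") for t in description.split()]
    tokens = [t for t in tokens if t]
    before, after = [], []
    for i, token in enumerate(tokens):
        if CVE_RE.fullmatch(token):
            before += _keep(tokens[max(0, i - window):i])
            after += _keep(tokens[i + 1:i + 1 + window])
    return {"before": before, "after": after}


context = parse_context(
    "Attackers exploit CVE-2017-8759 to execute code remotely via a malicious document."
)
```

Keeping the preceding and following words apart preserves whether a term describes how the CVE is reached or what it enables afterward.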

At 806, the system determines at least one first label for the first vulnerability that is associated with one or more of a plurality of stages of an attack chain taxonomy. The system determination may be based on filtering, similarity scores, entropy, proximity, frequency, or other selection options. The determined labels may be embeddings or vectors. As will be described below in more detail, the label(s) may be determined based on a degree to which (or distance between) the CVE context(s) for a vulnerability or exploit are similar (or dissimilar) to a respective attack stage “concept”. An attack stage concept may correspond to a textual representation of the attack stage, as will be described in more detail below.
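The similarity-based determination can be sketched in Python using cosine similarity over word counts. This is a deliberately simple stand-in for the learned embeddings described in this disclosure; the threshold value and the function names are hypothetical, and the concept phrases are borrowed from the ATT&CK concept examples given in this description.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def label_context(context_words, concepts, threshold=0.2):
    """Return (stage, score) pairs whose concept text is similar enough to the context."""
    context = Counter(w.lower() for w in context_words)
    scored = [(stage, cosine(context, Counter(text.lower().split())))
              for stage, text in concepts.items()]
    # Hypothetical threshold; a trained model would learn this decision boundary.
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)


concepts = {
    "Web Shell": "shell upload upload and execute arbitrary script",
    "Steal Web Session Cookie": "cookie theft session hijacking cookie guessing",
}
labels = label_context(["upload", "arbitrary", "script", "execute"], concepts)
```

A context may clear the threshold for more than one stage, which matches the multi-label nature of the mapping.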

Examples of attack stage types are as follows:

TABLE 3 Attack Stage Types

Attack Stage | Types
Injection | Remote/local code, Command, HTML, OS Command, PHP Code, PHP Object, XML External Entity
File Based | Access, read, write, delete, upload, Remote/local include, Temporary/arbitrary creation, insecure file permissions
Bypass | Access, Authentication, Authorization, brute force, hard coded credentials, man in the middle URI processing
Session | Fixation, hijacking, manipulation, weak management
Credentials | Hard coded or default credentials, misconfiguration, predictable random number, weak password encryption, certificate spoofing
Entry | Document based, email based, application based, click jacking, request based
Escalation | Null pointer de-reference, overflow, heap based overflow, integer overflow, stack overflow, memory corruption

Examples of concepts that are extracted (or derived) from an ATT&CK taxonomy are as follows:

TABLE 4 Concept Extractions from Attack Stage Textual Description

ATT&CK Technique | Concepts Extracted
Valid Accounts | Default accounts, admin account, unauthorized creation of user accounts, default-accounts, predictable credentials
Virtualization/sandbox Evasion | Sandbox process, sandbox restrictions, sandbox protections, bypass sandbox protection
Web service | Malicious web service
Web Shell | Shell upload, upload and execute arbitrary script
Winlogon Helper DLL | Unauthorized execution of DLL, creates malicious DLL
Spearphishing Attachment | Distributes the page and entices the user, phishing
Steal Web session cookie | Cookie theft, weak random session, malicious cookie, session impersonation, cookie guessing, session hijacking
System Network Connection Discovery | Manual scanning, port scanning, leaks protected network

At 808, the system generates or refines a model that maps the parsed text to at least one first label associated with the one or more stages of the attack chain taxonomy. The model may include the joint latent space 506 and MLP classifier 507. The mapping may include arranging or scoring labels in the joint latent space 506 based on relevance, attack timing, or mitigation. In an example, new CVE descriptions become available frequently, whereas the attack chain taxonomy and associated concepts may change less frequently. The above-noted model may generally be applied with respect to new CVE descriptions in a predictive manner so as to label the new CVE with regard to labels that are associated with one or more attack stages of a respective attack stage taxonomy, such as ATT&CK.

The model that is refined and/or generated in the final step of FIG. 8 may include the joint latent space 506 and the MLP classifier 507. The joint latent space 506 contains labels of different sizes from each of the embeddings modules. The MLP classifier 507 can then operate on labels of all different sizes and select sets of labels based on an input to the MLP classifier 507. Specifically, samples from two domains or sources such as the feature domain of CVE and corresponding ATT&CK domain are projected into the joint latent space which captures the structure of the labels, the encoded texts and the interactions between the two. Then the MLP classifier operates on the joint latent space which is independent of the label set size. The resulting model has the following properties: (i) each head (i.e., encoders 503, 504, 505) of the MLP classifier of the model learns the label dependency from the attacker, defender, and CVE metadata points of view; (ii) the dimension of the joint latent space 506 is independent of label size, such that input feature dimensions help the model to discover unseen labels; and (iii) the model is trained as in FIG. 8 with a cross-entropy loss and a sigmoid function, which is suitable for the multi-label, variable-size classification problem.
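Property (iii) can be illustrated with a minimal pure-Python sketch of a sigmoid output with a cross-entropy loss for multi-label prediction. The averaging over labels, the 0.5 decision threshold, and the function names are assumptions made for illustration, not details taken from this disclosure.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_cross_entropy(logits, targets):
    """Mean binary cross-entropy over independent per-label sigmoid outputs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def predict_labels(logits, names, threshold=0.5):
    """Every label whose sigmoid score clears the threshold is selected."""
    return [name for z, name in zip(logits, names) if sigmoid(z) >= threshold]


loss = multilabel_cross_entropy([0.0, 0.0], [1, 0])
chosen = predict_labels([2.0, -1.0, 0.3], ["Injection", "Session", "Entry"])
```

Because each label is scored by its own sigmoid rather than by a softmax across all labels, any number of stages can be selected for one CVE, which is what makes the formulation suit a label set of variable size.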

The process 800 as illustrated in FIG. 8 is described in more detail in relation to FIGS. 10-11. In particular, more structural detail is tied to various stages of the process. In addition, exemplary mathematical equations are described for performing one or more of the steps. In general, a system such as that illustrated in FIGS. 1-6 may perform a method 800 that ultimately generates and refines a model for characterizing and prioritizing CVEs. The inputs may be textual descriptions of CVEs, exploit reports, mitigation techniques, ATT&CK descriptions, and intrusion logs that give a detailed description of how CVEs have been used in the wild.

In FIG. 9, a number of steps of process 900 are provided that execute to process data through the model built previously according to one implementation of the system. In particular, a system such as that illustrated in FIGS. 1-6 may perform a method that characterizes and prioritizes one or more CVEs based on their description. Indeed, the steps may add or predict new, unknown, or previously unobserved features of the CVE. These added features may assist the vulnerability management system 250 in selecting which CVEs to address and how to recognize the use of a CVE in an attack sequence.

In particular, the system, at 902, may obtain at least one textual description of one or more features associated with a vulnerability and/or exploit. This textual description may be a CVE description, zero-day description, exploit code sample, or other vulnerability description. The textual description may be contained in one or more documents and may be derived from the active or passive scanners 110 and 120 or the vulnerability management system 250.

The system, at 904, may parse text from the at least one textual description in accordance with one or more rules. The rules may include selecting certain nouns, pronouns, verbs, and/or abbreviations from the textual description. The rules may include selecting words based on proximity to a named CVE or other keyword. The rules may include selecting or separating words based on whether the words precede a keyword or follow a keyword. The rules may be adapted for various languages. Labels may be based on the words of the parsed text, inferred from the parsed text, or generated based on correlation with the parsed text. The system may perform filtering, similarity scores, entropy, proximity, frequency, or other selection options on the labels or parsed text. The labels may be embeddings or vectors.

At 906, the system obtains a model that maps textual data to labels for the one or more features of the vulnerability and/or exploit to respective stages of an attack chain taxonomy. The model may include the joint latent space 506 and MLP classifier 507. The mapping may include arranging or scoring labels in the joint latent space 506 based on relevance, attack timing, or mitigation.

At 908, the system maps the parsed text to at least one first label for the first vulnerability associated with one or more stages of the attack chain taxonomy in accordance with the model. That is, the system runs the acquired vectorized text for the vulnerability or CVE through the model to predict the stage(s) in the attack chain in which the CVE may potentially be used.

It should be noted that only the description of the CVE needs to be input to the process 900 for stages of the attack chain to be determined or mapped to the CVE. Therefore, the model is adding new knowledge, not merely processing exploit reports. Thus, CVEs which have yet to be used can be prioritized with far more knowledge and information by the vulnerability management system 250. In particular, the MLP classifier 507 utilizes the encoded labels to discover new connections, concepts, and labels from amongst the three dimensions of the joint latent space 506.

FIGS. 10-11 illustrate example implementations of the processes 800-900 of FIGS. 8-9 in accordance with one or more aspects of the present disclosure. In particular, FIGS. 10-11 illustrate in more detail the timing, structure, and processes of the process 800 that builds the model, specifically the system that fills the joint latent space with labels and improves those labels. FIG. 10 illustrates a first portion of the process beginning with the source information 1001 and ending with the preliminary mapping modules 1003, which correspond to the combination nodes of FIG. 6. FIG. 11 then illustrates a second portion of the system beginning with the preliminary mapping modules 1003 and ending with the MLP classifier 507.

Because very little structured, tagged, embedded, or labeled data is available to describe CVEs, the system takes as inputs attack descriptions 1004 including CVE descriptions (as in 802), exploit reports 1005 with ATT&CK or attack stage descriptions, and attack mitigation steps 1006. These unstructured datasets may be raw text, markup (e.g., XML), or another text format. The unstructured data is parsed into tokens or vectors by a natural language processor (e.g., word2vec) as part of 804 before being passed to the context encoder 503, the label encoder 504, and the transformer network 505.

The parsing of the unstructured data may begin with a word to vector encoder that identifies key terms or words (nouns in particular) and extracts surrounding descriptors to form a vector. According to one implementation, some or all nouns and verb phrases are extracted as candidates from the CVE descriptions. For each candidate, some or all words within the phrase and a window of N context words to each side of the phrase are used, the window being an implementation of the one or more rules of step 804. In particular, three separate sequences of words may become vectors: the left context, the phrase, and the right context. For the labelling process to be accurate, the similarity/dissimilarity between ATT&CK technique concepts and CVE description phrases or contexts is measured using various distance functions, which may be the one or more rules of 804.
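The windowing rule described above can be sketched as follows. This is a minimal illustration only, assuming whitespace tokenization and an exact match on the phrase tokens; the function name `context_windows` and its parameters are hypothetical and do not appear in the disclosure.

```python
from typing import List, Optional, Tuple


def context_windows(
    tokens: List[str], phrase: List[str], n: int
) -> Optional[Tuple[List[str], List[str], List[str]]]:
    """Return (left context, phrase, right context) around the first match.

    The left and right contexts hold at most n tokens each, mirroring the
    "window of N context words to each side of the phrase".
    """
    m = len(phrase)
    for start in range(len(tokens) - m + 1):
        if tokens[start:start + m] == phrase:
            left = tokens[max(0, start - n):start]
            right = tokens[start + m:start + m + n]
            return left, phrase, right
    return None  # phrase not present in the token sequence


# Hypothetical usage on a simplified sentence (punctuation removed).
sentence = "The malicious document containing CVE-2017-8759 exploit downloads multiple components"
result = context_windows(sentence.split(), ["CVE-2017-8759", "exploit"], n=2)
```

A production parser would likely select candidate phrases with a part-of-speech tagger rather than an exact token match; that step is left out here.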

For example, below is a sentence from a threat report for CVE-2017-8759 highlighting the left and right contexts around the phrase “CVE-2017-8759 exploit”: “The [left context] malicious document [left context] containing CVE-2017-8759 exploit, [right context] downloads multiple components [right context], and eventually launches a FINSPY payload.” The three vectors (left, phrase, right) may be composed again using an element-wise mean, $c(W_l; W_p; W_r) = \mathrm{mean}(\mathrm{mean}(W_l); \mathrm{mean}(W_p); \mathrm{mean}(W_r))$. The components of the candidate vector c are the mean components of the words in the left context Wl, the words in the phrase Wp, and the words in the right context Wr.
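A small numeric sketch of the element-wise mean composition is given below. The two-dimensional word vectors are invented for illustration, and skipping an empty context is an assumption made here, since the text does not say how a missing context is handled.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def candidate_vector(left: List[Vector], phrase: List[Vector], right: List[Vector]) -> Vector:
    """c(Wl; Wp; Wr) = mean(mean(Wl); mean(Wp); mean(Wr)).

    Assumption: an empty context is skipped rather than treated as zeros.
    """
    parts = [mean_vector(words) for words in (left, phrase, right) if words]
    return mean_vector(parts)


# Toy word vectors (hypothetical values).
w_left = [[1.0, 0.0], [3.0, 0.0]]   # mean -> [2.0, 0.0]
w_phrase = [[0.0, 2.0]]             # mean -> [0.0, 2.0]
w_right = [[4.0, 4.0]]              # mean -> [4.0, 4.0]
c = candidate_vector(w_left, w_phrase, w_right)  # [2.0, 2.0]
```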

For each of the parsing and encoding (embedding) modules 1002, the parsed text may be further processed through a bi-LSTM network, LSTM network, or another artificial recurrent neural network as a part of 804. The token embedding layer of the context encoder 503 takes a token as input and outputs its vector representation. Given an input sequence of tokens x1 . . . xn, the output vector ei (i=1 . . . n) of each token xi results from the concatenation of two different types of embeddings: token embeddings Vt(xi) and the character-based token embeddings (bi) that come from the output of a character-level and word-level bi-LSTM encoder. Features that have less contextual information but may contain out-of-vocabulary (OOV) tokens also pass through the token embedding layer to the joint latent space 506.

Likewise, label embeddings in the label encoder 504 are generated from a word to vector system with the LSTM model and are derived from the attack/intrusion techniques (e.g., ATT&CK) ymi=y1, y2 . . . yn and corresponding descriptions. Once the labels (concept labels) are generated by the label encoder 504, the labels are sent to the combination node 1007, where they are filtered and combined with the output of the context encoder 503. This combination of attack technique with the CVE or mitigation may be a part of the determining process of 806. The combination node 1007 may utilize a type of distance function such as cosine similarity, Fisher linear discriminant, L² (Euclidean) distance, Maximum Mean Discrepancy (MMD), and other correlation functions.

The cosine similarity of concepts extracted from the ATT&CK or intrusion techniques and the phrases from threat reports may be given by: sim(phrase, concept) = (phrase·concept) / (|phrase|₂·|concept|₂). Cosine similarity measures the nearness of the phrase to the concept in order to assign labels. Finally, the following assign function labels a given phrase (CVE) from a threat report with relevant technique labels, where the maximum is taken over the candidate concepts c:

assign(phrase) = arg max_c sim(phrase, c)

The techniques having the highest cosine similarity with the phrases are assigned as labels, each label identifying a technique which is likely to be used with the CVE, as in 808.
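The similarity and assignment rule can be sketched in a few lines. The concept vectors below are made-up two-dimensional values used only to exercise the functions; in the described system they would come from the label encoder. Returning 0.0 for a zero-norm vector is an assumption added to avoid division by zero.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """sim(a, b) = (a . b) / (|a|_2 * |b|_2); 0.0 if either norm is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign(phrase: List[float], concepts: Dict[str, List[float]]) -> str:
    """Return the concept name with the highest cosine similarity to the phrase."""
    return max(concepts, key=lambda name: cosine_similarity(phrase, concepts[name]))


# Hypothetical concept embeddings keyed by technique name.
concepts = {
    "Exploit Public-Facing Application": [1.0, 0.0],
    "User Execution": [0.0, 1.0],
}
label = assign([0.9, 0.1], concepts)
```

To assign several technique labels to one phrase, the same scores could be sorted and thresholded instead of taking a single maximum.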

The joint latent space 506 between two ATT&CK technique domains and the CVE feature domain is created by a component-wise multiplication of each embedding type with the label embedding, their joint representation being given by:

$h_{Aj}^{(ij)} = h_{i}^{y} \cdot h_{i}^{A}$ and $h_{Mj}^{(ij)} = h_{i}^{y} \cdot h_{i}^{M}$,

where $h_{i}^{y}$ is the label embedding, $h_{i}^{A}$ is the mitigation or transform embedding, and $h_{i}^{M}$ is the context embedding. The probabilities for each are calculated as: $p_{A}^{(ij)} = h_{Aj}^{(ij)} \omega_{A} + b_{A}$ and $p_{M}^{(ij)} = h_{Mj}^{(ij)} \omega_{M} + b_{M}$. The probability that h belongs to one of the k known labels is modeled by a linear unit that maps any point in the joint space into a score which indicates the validity of the combination, where $\omega \in \mathbb{R}^{d_j}$, b is a scalar variable, and di is the number of CVE exploits input for training. Therefore, the $h_{Aj}^{(ij)}$ dot product or component-wise multiplication is an implementation of combination node 1009, and the $h_{i}^{M}$ dot product or component-wise multiplication is an implementation of combination node 1007.
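The joint representation and the linear scoring unit can be illustrated with plain lists. This is a sketch under stated assumptions: the embeddings, the weight vector `omega`, and the bias `b` hold invented values, and the helper names are hypothetical.

```python
from typing import List


def hadamard(u: List[float], v: List[float]) -> List[float]:
    """Component-wise product of two equal-length embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return [a * b for a, b in zip(u, v)]


def linear_score(h: List[float], omega: List[float], b: float) -> float:
    """Linear unit p = h . omega + b mapping a joint-space point to a score."""
    return sum(x * w for x, w in zip(h, omega)) + b


# Toy values (hypothetical): label embedding h_y and embedding h_a.
h_y = [1.0, 2.0, 3.0]
h_a = [0.5, 0.5, 2.0]
h_joint = hadamard(h_y, h_a)                            # [0.5, 1.0, 6.0]
p_a = linear_score(h_joint, [1.0, 1.0, 0.5], b=0.5)     # 0.5 + 1.0 + 3.0 + 0.5
```

Because the product is taken component by component, the joint vector keeps the embedding dimension regardless of how many labels exist, which is consistent with the label-size independence described for the joint latent space.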

Finally, the output of combination node 1007 and combination node 1009 combined with the probabilities creates a multi-dimensional joint latent space 506 or model where attack chain description labels are mapped to CVE description labels and mitigation labels as in 808. These mapped labels are joined in a single joint latent space 506 via dot product or combination node 1008 as illustrated in FIG. 10 and FIG. 11. The resulting joint space has an independent label dimension.

A training set with N samples is given as Dtr={(xi, ymi), i=1, . . . , N}, with xi={xdi, xsi, xci, xti}, where xdi is the textual description of the CVE, xsi represents the sequence of steps to exploit the CVE, xci denotes the mitigation steps and controls needed to reduce the attack surface for the CVE, and xti represents the high-level characteristics of the CVE, namely CPE, CVSS base and temporal strings, CWE, classification of the CVE, credibility, local vs. remote CVE, and severity. The term ymi denotes the corresponding ATT&CK intrusion techniques, represented as ymi=y1, y2 . . . yn for the sample xi.
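One way to hold a single training sample (xi, ymi) in code is a small record type. The class and field names below are hypothetical, and the field values are placeholders rather than data from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingSample:
    """One element (x_i, y_mi) of the training set D_tr (names are illustrative)."""
    description: str                  # x_di: textual description of the CVE
    exploit_steps: List[str]          # x_si: sequence of steps to exploit the CVE
    mitigations: List[str]            # x_ci: mitigation steps and controls
    characteristics: Dict[str, str]   # x_ti: CPE, CVSS strings, CWE, severity, ...
    techniques: List[str]             # y_mi: ATT&CK intrusion technique labels


# Placeholder content only; none of these strings is taken from a real record.
sample = TrainingSample(
    description="placeholder CVE description",
    exploit_steps=["placeholder step 1", "placeholder step 2"],
    mitigations=["placeholder mitigation"],
    characteristics={"severity": "placeholder", "access": "remote"},
    techniques=["Exploit Public-Facing Application", "User Execution"],
)
training_set: List[TrainingSample] = [sample]  # D_tr with N = 1
```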

The system categorizes the first vulnerability into one or more of a plurality of stages of an attack chain taxonomy. The categorization when building the model may be a manual process based on the exploit or intrusion report. The categorization may be automated based on imported attack techniques, parsing of the exploit report, and/or learning from other labels and categories. The system may use a past categorization or receive a categorization of the first vulnerability into the attack chain taxonomy (e.g., ATT&CK).

For a pre-trained token embedding, a word to vector coder (e.g., word2vec) may be trained with a window size of 8, a minimum vocabulary count of 1, and 15 iterations. The negative sampling number is set to 8 and the model type may be skip-gram. The dimension of the output token embedding is set to 300. The transformer network may be configured with 2 transformer blocks, with a hidden size of 768 and a feed-forward intermediate layer size of 4×768, i.e., 3072, the hidden size relating to hidden layers of the feed-forward neural network. The 768-dimensional representation obtained from the transformer is pooled by the decoder, which is a five-layer feed-forward network with rectified linear unit (ReLU) nonlinearity in each layer, a hidden size of 200, and a 300-dimensional output layer for the embedding.

An implementation of the labeling and filtering pipeline 600 on a sample of 62,000 CVE records identified the attack sequences below (with count > 100). As can be noted, these attack sequences are far more detailed than the four attack sequences/categories of the ATT&CK framework and those techniques listed in Tables 3 and 4:

TABLE 5
Discovered Attack Sequences (record count > 100 reported)

CVE record count   Attack Sequences
103                Steal web session cookie, web session cookie
106                Spearphishing attachment
117                Exploit public facing application, exploitation for defense evasion, file and directory permissions modifications
119                Command line interface, exploitation of remote services
129                Install root certificate
130                Exploitation for defense evasion, web shell
143                Shortcut modification, taint shared content
144                Exploit public-facing application, user execution
152                Command line interface
180                File and directory permissions modification
193                Compiled HTML file, Exploit public-facing application
258                Exploit public-facing application, Exploitation for defense evasion
277                Exploit public-facing application, Spearphishing attachment
311                User Execution
385                Exploitation for defense evasion, Spearphishing attachment
409                Exploitation for defense evasion, exploitation of remote services, User Execution
429                Command line interface, Exploit public-facing application
430                Account Manipulation
789                Exploitation for defense evasion, Install root certificate
1004               Exploitation for defense evasion, Exploitation for privilege escalation
1120               Exploitation for defense evasion, User Execution
1609               Exploitation for privilege escalation
2174               Exploitation for defense evasion, Exploitation of remote services
3994               Exploitation of remote services
7833               Exploit public-facing application
11108              Exploitation for defense evasion

In FIG. 11, the combination nodes 1007 and 1009 are shown feeding the central mapping node 1008, which builds the joint latent space 506 housed in the label database 1101 of the data manipulator 502 as part of 808. The MLP classifier 507 is trained on the joint latent space 506. In particular, the classifier is trained with a binary cross-entropy loss and applies the sigmoid function:

$\hat{y}_{i} = p\left( y_{i} \mid x_{i} \right) = \frac{1}{1 + e^{-P_{val}^{(i)}}}$
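The sigmoid output and the binary cross-entropy loss can be sketched per label as follows. This is an illustrative implementation, not the training code of the described system; the branch on the sign of the score and the clamping constant `eps` are numerical-stability choices added here.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """1 / (1 + e^{-z}), evaluated in a form that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(scores: List[float], targets: List[int]) -> float:
    """Mean binary cross-entropy over labels, one raw score and 0/1 target per label."""
    eps = 1e-12  # clamp probabilities away from 0 and 1 before taking logs
    total = 0.0
    for z, y in zip(scores, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


# Two labels with raw score 0.0 each: every probability is 0.5, so the loss is ln 2.
loss = binary_cross_entropy([0.0, 0.0], [1, 0])
```

Scoring each label independently through a sigmoid, instead of normalizing across labels with a softmax, is what lets a single CVE receive any number of technique labels.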

Given the sample input xi and the associated labels ymi, the trained classifier is able to predict labels in both the seen label set, Ys, and the unseen label set, Yu, defined as the sets of unique labels which have or have not been seen during training, respectively; hence, Ys∩Yu=Ø and Y=Ys∪Yu. The newly discovered labels may be mapped to CVEs or used to identify new uses for CVEs or new mitigation techniques for CVEs. Alternatives for training include exponential linear units, rectified linear units, scaled exponential linear units, Gaussian error linear units, and leaky rectified linear units.

This novel label discovery is performed by label discovery engine 1105 as part of the refining feature of 808. There are only four ATT&CK techniques (Exploit Public-Facing Application, Exploitation for Client Execution, Exploitation for Privilege Escalation, Exploitation of Remote Services) which cover the exploitation phase of the attack chain, but there are no more granular categories that can be mapped. Accordingly, the MLP classifier 507 utilizes label mappings engine 1104 and label discovery engine 1105 to add more granularity to the intrusion techniques in 808. Old CVEs which were assigned to an ATT&CK technique can be reassigned to a new technique based on the evolution of attackers' methods over time. New techniques, CVEs, attack scenarios, and mitigations are constantly added to combat new threats, and the old model still has to work with new data that exhibits concept drift.

The system enriches CVEs with a curated knowledgebase of 150 attack scenarios for exploiting vulnerabilities and 50 mitigation strategies, which help the model to learn both the attacker and defender views of a given CVE. The system was tested with a dataset containing CVEs disclosed over the past 10 years and compared with standard baseline models and ablation analysis. Using the resulting model, 62,000 CVE records were mapped to different ATT&CK techniques, and 135 unique attack sequences were identified (an attack sequence can be viewed as a set of one or more ATT&CK attack techniques assigned to one CVE record).

Various models including Bi-LSTM, Attention-based Bi-LSTM, and TF-IDF-based SVM multi-label classifiers may also be used as the classifier. The term frequency-inverse document frequency (TF-IDF) approach represents all textual features as vectors with the same length as the vocabulary of the entire text corpus. For the TF-IDF model, each entry in the vector corresponds to a unique word, and its weight gives the frequency of that word in the post divided by its document frequency. These document vectors are then used in the classification task. Also, since TF-IDF results in high-dimensional representations, a support vector machine (SVM) is applied on the TF-IDF features. In testing, the MLP classifier 507 operating on the three filtered and combined label domains generates the best results.
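The TF-IDF weighting as worded above (term frequency divided by document frequency) can be sketched without external libraries. The tokenization by lowercasing and whitespace splitting is an assumption, and many library implementations instead use a logarithmic inverse document frequency, which this sketch deliberately does not.

```python
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(documents: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, vectors); each weight is tf(word, doc) / df(word)."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    # Document frequency: number of documents that contain the word at least once.
    df = Counter(word for doc in tokenized for word in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append([tf[word] / df[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical two-document corpus.
vocab, vecs = tfidf_vectors(["remote code execution", "remote attacker"])
# vocab is ['attacker', 'code', 'execution', 'remote']
```

Each vector has one entry per vocabulary word, so its length grows with the corpus; this is the high dimensionality that motivates pairing the features with an SVM.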

The efficiency of the various models may be given by their correlation scores, which are provided in Table 6 below for various baselines (P@1, P@3, P@5):

TABLE 6
Model performance for various baselines

Model                            P@1      P@3      P@5
Bi-LSTM + MLP                    0.8557   0.8223   0.838
Attention-Based Bi-LSTM + MLP    0.8757   0.8234   0.848
TF-IDF + SVM                     0.7619   0.6246   0.686
Proposed Model                   0.9316   0.9589   0.945

Likewise, the model run with ablation testing for various layer combinations of the model had the following efficiency scores:

TABLE 7
Ablation test of various layers/encoders of the model

Labels Layer                     P@1      P@3      P@5
h^M + MLP                        49.84%   32.27%   24.17%
h^A + MLP                        70.40%   54.98%   44.86%
(h^A · h^M) + MLP                85.28%   61.12%   52.78%
(h^A · h^M · h^y) + MLP          93.16%   95.89%   94.50%
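The P@1, P@3, and P@5 columns are not defined in the text; the sketch below assumes they denote precision at k, a common ranking metric for multi-label classification, computed as the fraction of the k highest-scoring predicted labels that are true labels. The scores and label names are invented for illustration.

```python
from typing import Dict, Set


def precision_at_k(scores: Dict[str, float], relevant: Set[str], k: int) -> float:
    """Fraction of the k highest-scoring labels that appear in the relevant set."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for label in top_k if label in relevant) / k


# Hypothetical classifier scores for one CVE and its true technique labels.
scores = {
    "Exploit Public-Facing Application": 0.9,
    "User Execution": 0.8,
    "Exploitation for Privilege Escalation": 0.1,
    "Install Root Certificate": 0.05,
}
truth = {"Exploit Public-Facing Application", "Exploitation for Privilege Escalation"}
p1 = precision_at_k(scores, truth, 1)  # top label is correct
p2 = precision_at_k(scores, truth, 2)  # one of the top two is correct
```

A reported figure would be the average of this per-sample value over the whole test set.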

The model developed in process 800 and FIGS. 10-11 may then be used, applied, or executed for a new CVE outside the training set as shown in FIG. 12 and process 900. Specifically, the system receives unstructured vulnerability information inputs 1201 to begin the process as in 902. These inputs 1201 may include CVE descriptions 1204 and zero-day reports or code 1205 pertaining to a CVE outside the training set that the vulnerability management system 250 needs prioritized or characterized. Notably, the CVE or vulnerability may not yet have been used or described as part of an attack chain. The unstructured inputs 1201 may then be parsed by a word-to-vector natural language processor into tokens as part of 904 (other word parsers may be used).

The context encoder 503 receives the tokens and performs several filtering operations using a bi-LSTM model or other filtering engine. The filtering results of context encoder 503 or embedding module 1202 determine or select one or more labels at 904 that characterize the CVE. These labels are then passed to the MLP classifier 507, which places them in the joint latent space 506 as part of the label mapping 908. The context encoder 503 may also output to a combination node where the context labels from the CVE are matched with attack techniques from a label encoder 504 as part of 908.

A mapping in the form of a heat map of MITRE ATT&CK tactics and techniques produced by the proposed model for the CVE dataset is shown in FIGS. 13A-13B. The numbers in each cell correspond to the CVE count for a particular tactic and technique. The basic CVE descriptions, when mapped to corresponding ATT&CK techniques and tactics, can help defenders to correctly assess the risk and understand at which stage of the attack cycle the corresponding CVEs are being used. The MLP classifier 507 may then operate on the labels to place the labels in the appropriate place in the joint latent space 506. In particular, the MLP classifier 507 may assign attack stages and mitigation strategies from those two feature/label domains of the joint latent space 506. In addition to selecting the best attack technique and mitigation strategy labels from the joint latent space 506, the MLP classifier 507 also feeds the new labels received for the mapped CVE into the joint latent space 506 to improve and refine the model, which may be part of 908, as shown by the double-headed arrow.

The MLP classifier 507 or the larger data manipulator 502 and computing device 501 may then transmit the resulting characterization, attack chain taxonomy, and mitigation strategies to the vulnerability management system 150/250 or database(s) 508. The vulnerability management system 150/250 may then perform the mitigation strategies or instruct one of the other system elements (110, 120, 130, or 140) to implement all or part of the mitigation strategies. The mitigation strategy may include port blocking, patching, code scanning, packet scanning, or other prevention measures for a known CVE. The vulnerability management system 150/250 may also apply the characterization from the MLP classifier 507 to prioritize the fixing of the CVE relative to other CVEs based on other system information.

Those skilled in the art will appreciate that information and signalsmay be represented using any of a variety of different technologies andtechniques. For example, data, instructions, transmissions, commands,information, signals, bits, symbols, and chips that may be referencedthroughout the above description may be represented by voltages,currents, electromagnetic waves, magnetic fields or particles, opticalfields or particles, or any combination thereof.

Further, those skilled in the art will appreciate that the variousillustrative logical blocks, modules, circuits, and algorithm stepsdescribed in connection with the aspects disclosed herein may beimplemented as electronic hardware, computer software, or combinationsof both. To clearly illustrate this interchangeability of hardware andsoftware, various illustrative components, blocks, modules, circuits,and steps have been described above generally in terms of theirfunctionality. Whether such functionality is implemented as hardware orsoftware depends upon the particular application and design constraintsimposed on the overall system. Skilled artisans may implement thedescribed functionality in varying ways for each particular application,but such implementation decisions should not be interpreted to departfrom the scope of the various aspects and embodiments described herein.

The various illustrative logical blocks, modules, and circuits describedin connection with the aspects disclosed herein may be implemented orperformed with a general purpose processor, a digital signal processor(DSP), an application specific integrated circuit (ASIC), a fieldprogrammable gate array (FPGA) or other programmable logic device,discrete gate or transistor logic, discrete hardware components, or anycombination thereof designed to perform the functions described herein.A general purpose processor may be a microprocessor, but in thealternative, the processor may be any conventional processor,controller, microcontroller, or state machine. A processor may also beimplemented as a combination of computing devices (e.g., a combinationof a DSP and a microprocessor, a plurality of microprocessors, one ormore microprocessors in conjunction with a DSP core, or any other suchconfiguration).

The methods, sequences, and/or algorithms described in connection withthe aspects disclosed herein may be embodied directly in hardware, in asoftware module executed by a processor, or in a combination of the two.A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM,registers, hard disk, a removable disk, a CD-ROM, or any other form ofnon-transitory computer-readable medium known in the art. An exemplarynon-transitory computer-readable medium may be coupled to the processorsuch that the processor can read information from, and write informationto, the non-transitory computer-readable medium. In the alternative, thenon-transitory computer-readable medium may be integral to theprocessor. The processor and the non-transitory computer-readable mediummay reside in an ASIC. The ASIC may reside in an IoT device. In thealternative, the processor and the non-transitory computer-readablemedium may be discrete components in a user terminal.

In one or more exemplary aspects, the functions described herein may beimplemented in hardware, software, firmware, or any combination thereof.If implemented in software, the functions may be stored on ortransmitted over as one or more instructions or code on a non-transitorycomputer-readable medium. Computer-readable media may include storagemedia and/or communication media including any non-transitory mediumthat may facilitate transferring a computer program from one place toanother. A storage media may be any available media that can be accessedby a computer. By way of example, and not limitation, suchcomputer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or otheroptical disk storage, magnetic disk storage or other magnetic storagedevices, or any other medium that can be used to carry or store desiredprogram code in the form of instructions or data structures and that canbe accessed by a computer. Also, any connection is properly termed acomputer-readable medium. For example, if the software is transmittedfrom a website, server, or other remote source using a coaxial cable,fiber optic cable, twisted pair, DSL, or wireless technologies such asinfrared, radio, and microwave, then the coaxial cable, fiber opticcable, twisted pair, DSL, or wireless technologies such as infrared,radio, and microwave are included in the definition of a medium. Theterm disk and disc, which may be used interchangeably herein, includesCD, laser disc, optical disc, DVD, floppy disk, and Blu-ray discs, whichusually reproduce data magnetically and/or optically with lasers.Combinations of the above should also be included within the scope ofcomputer-readable media.

While the foregoing disclosure shows illustrative aspects andembodiments, those skilled in the art will appreciate that variouschanges and modifications could be made herein without departing fromthe scope of the disclosure as defined by the appended claims.Furthermore, in accordance with the various illustrative aspects andembodiments described herein, those skilled in the art will appreciatethat the functions, steps, and/or actions in any methods described aboveand/or recited in any method claims appended hereto need not beperformed in any particular order. Further still, to the extent that anyelements are described above or recited in the appended claims in asingular form, those skilled in the art will appreciate that singularform(s) contemplate the plural as well unless limitation to the singularform(s) is explicitly stated.

What is claimed is:
 1. A method of semantic model training, comprising:obtaining at least one first textual description of one or more featuresassociated with a first vulnerability that has been used in one or moreattacks; parsing text from the at least one first textual description inaccordance with one or more rules; determining at least one first labelfor the first vulnerability that is associated with one or more of aplurality of stages of an attack chain taxonomy; and generating orrefining a model that maps the parsed text to the at least one firstlabel associated with the one or more stages of the attack chaintaxonomy.
 2. The method of claim 1, wherein the at least one firsttextual description comprises an intrusion or exploit report, aproof-of-concept, or a zero-day report, or wherein the at least onefirst textual description includes an adversarial tactics, techniquesand common knowledge (ATT&CK) description, a mitigation techniquedescription, a patch description, a description of a sequence of stepsfor exploit, a rating-level characterization of a vulnerability, avulnerability description, or any combination thereof.
 3. The method of claim 1, further comprising: inserting the at least one first label into a joint label space; inserting at least one second label related to one or more intrusion techniques into the joint label space; generating at least one technique label based on labels in the joint label space, wherein the determination of the at least one first label for the first vulnerability is based on context extracted from the parsed text, wherein the generating of the at least one technique label is based on a distance function between the at least one second label and the at least one first label.
 4. The method of claim 1, wherein the generating orrefining of the model comprises execution of a machine learning processthat maps the parsed text and/or the at least one first label to the oneor more stages of the attack chain taxonomy, and wherein a classifier istrained to map text parsed from a vulnerability description to the oneor more stages of the attack chain taxonomy.
 5. The method of claim 1, further comprising: obtaining at least one second textual description of one or more additional features associated with a second vulnerability; parsing text of the second textual description in accordance with the one or more rules; generating or determining at least one second label for the second vulnerability from the text parsed in accordance with the one or more rules; and mapping the at least one second label to at least one stage of the attack chain taxonomy based on the model.
 6. The methodof claim 1, wherein the generating or refining includes: generatinglabels of a joint label space by a multi-label text classification modelhaving at least two label encoding heads.
 7. The method of claim 6, wherein a first head of the at least two label encoding heads comprises a context encoder that encodes vector representations of words associated with the first vulnerability based on the parsed text, wherein a second head of the at least two label encoding heads is a concept encoder that identifies the one or more stages of the attack chain taxonomy associated with the first vulnerability as labels based on the parsed text.
 8. The method of claim 7, wherein a third head ofthe at least two label encoding heads encodes attacker actions andmitigation techniques.
 9. The method of claim 8, wherein an output ofthe first head and an output of the second head are combined andinserted into the joint label space, and wherein the output of thesecond head and the third head are combined and inserted into the jointlabel space.
 10. The method of claim 6, further comprising: training amulti-layer perceptron classifier via machine learning on the jointlabel space.
 11. A method, comprising: obtaining at least one textualdescription of one or more features associated with a vulnerabilityand/or exploit; parsing text from the at least one textual descriptionin accordance with one or more rules; obtaining a model that mapstextual data to labels for the one or more features of the vulnerabilityand/or exploit to respective stages of an attack chain taxonomy; andmapping the parsed text to at least one first label for the firstvulnerability associated with one or more stages of the attack chaintaxonomy in accordance with the model.
 12. The method of claim 11,wherein a classifier operates on a joint latent space of the model, theclassifier assigning labels to the vulnerability and/or exploit from alabel set of the joint latent space.
 13. The method of claim 12, wherein a size of the label set is independent of the joint latent space.
 14. The method of claim 11, wherein after training, a classifier predicts labels for the vulnerability and/or exploit based on the parsed text, the labels being derived from a first label set of a joint latent space observed during training and a second label set of the joint latent space that was not observed during training.
 15. The method of claim 11,wherein the one or more rules comprise: a rule for selecting certainnouns, pronouns, verbs, and/or abbreviations from the at least onetextual description, a rule for selecting words based on proximity to anamed instance of the vulnerability, a rule selecting or separatingwords based on whether the words precede a keyword or follow a keyword,or any combination thereof.
 16. An apparatus, comprising: a memory; andat least one processor coupled to the memory and configured to: obtainat least one textual description of one or more features associated witha vulnerability and/or exploit; parse text from the at least one textualdescription in accordance with one or more rules; obtain a model thatmaps textual data to labels for the one or more features of thevulnerability and/or exploit to respective stages of an attack chaintaxonomy; and map the parsed text to at least one first label for thefirst vulnerability associated with one or more stages of the attackchain taxonomy in accordance with the model.
 17. The apparatus of claim16, wherein a classifier operates on a joint latent space of the model,the classifier assigning labels to the vulnerability and/or exploit froma label set of the joint latent space.
 18. The apparatus of claim 17,wherein a size of the label set is independent of the joint latentspace.
 19. The apparatus of claim 16, wherein after training, aclassifier predicts labels for the vulnerability and/or exploit based onthe parsed text, the labels being derived from a first label set of ajoint latent space observed during training and a second label set ofthe joint latent space that was not observed during training.
 20. Theapparatus of claim 16, wherein the one or more rules comprise: a rulefor selecting certain nouns, pronouns, verbs, and/or abbreviations fromthe at least one textual description, a rule for selecting words basedon proximity to a named instance of the vulnerability, a rule selectingor separating words based on whether the words precede a keyword orfollow a keyword, or any combination thereof.